Qualify C-style memory allocation functions as pure#1683
andralex merged 1 commit into dlang:master from
|
cc @MartinNowak @WalterBright to make sure we're on the same page. The failure is caused by |
8a64a69 to
2f0c852
Compare
|
Pushed again. |
|
If you log in to the auto tester, you can just deprecate the bad test |
|
BTW: Shouldn't we qualify |
I don't think so.
void foo(immutable T* ptr) pure
{
    ...
    free(cast()ptr);
    ...
}
The cast is likely to happen because "I know what I'm doing, and this needs to be freed at this point". It may even be in a destructor, done without the author's knowledge (think reference counting). But look at the signature of |
|
It's an interesting question. We could classify |
This could be a possibility. But then aren't we reducing the usefulness of pure functions? It would be quite easy to make a template that gets inferred as pure in some cases, and then I would want the compiler to optimize away needless calls. |
|
@schveiguy if it returns void and is pure... that's either a bug or a feature, but not a call you want optimized away :o). I know, it's zen. |
|
For instance:
void popFrontN(R)(ref R range, size_t n) // inferred pure when range.popFront is pure
{
    foreach(i; 0 .. n) range.popFront;
}
struct Repeat(T)
{
    T val;
    T front() const { return val; }
    void popFront() const {} // inferred pure
    enum empty = false;
}
immutable foo = Repeat!int(5);
foo.popFrontN(1_000_000); // please elide this
Sure, a contrived example. But these kinds of things happen all the time with generic code. I'm expecting the compiler to elide the calls that I would have if I wrote it by hand instead of using a generic template. |
|
@schveiguy I did agree with this but the more I think about it, the more I think that the compiler eliding function calls based on criteria that are inferred is a terrible, terrible idea. It's just ripe for wrongly-elided calls and I'm surprised we haven't had more reports of it happening already (probably because much of Phobos is impure). |
|
@schveiguy your code example with cast is not allowed in @safe mode. What about fixing DMD so that it only performs these optimizations on calls to @WalterBright perhaps this already works in this way? If so I guess a unittest is in place to verify that DMD does not optimize away the second call to a Update: What about the purity of |
|
Ping, @andralex ! Should I add a unittest? |
|
Not this again. |
|
@ibuclaw what do you mean? |
|
@ibuclaw we've gotten to a better understanding of the matters involved and fixed a bug in the compiler such that |
|
Auto-merge toggled on |
2f0c852 to
d9b9724
Compare
|
I'm really starting to lose track of what |
This argumentation works until you start peeling layers. The elided call doesn't have to be an inferred pure one. It's also not practical -- I may use a template because I want to write my code once for many types. I don't want to have to repeat this identical code in a pure and non-pure template, just because I want to have efficient generated code. In any case, my dire prediction is that marking @JackStouffer The explanation of D pure is simple. The benefits the optimizer enjoys are what is complex. And the user doesn't generally have to be aware of this, they just know adding |
|
This really feels like cheating that will come back to bite us one day. I can see a DIP being submitted within a year to create a
I'm genuinely asking, what on Earth does "effectively pure" even mean? Conceptually what is the difference between making The edge cases are always the most important focus when creating a language feature because the simple stuff almost always will work. It's the edge cases which can shoot you in the foot. So when your function call is only pure normally and is impure in the edge cases, that's an impure function.
Let's try to write one with this change in mind: A pure function is a function which does not read or modify anything outside of its scope such as reading a global variable or doing IO. But modifying function parameters which are taken by Short version: This is a C++ style explanation if I ever saw one. I don't think this change is particularly bad, it's just this is another step in a trend in D of features that no one really knows how they work. See |
|
Maybe a stupid question, but could somebody please explain why
void* malloc(size_t size) pure;
is weakly (and not strongly) pure? AFAIR a method is only weakly pure if any parameter can cause side-effects (i.e. one parameter needs to have indirections and be mutable). Or does
mean you've hardcoded a list of such functions in the compiler? |
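For readers following the question above, here is a minimal sketch of what a pure allocation buys in practice: a `pure` function that uses a heap scratch buffer. It assumes a druntime that exposes `core.memory.pureMalloc` and `core.memory.pureFree`; the helper `sumUpTo` is hypothetical and only illustrates the idea, it is not code from this PR.

```d
import core.memory : pureMalloc, pureFree;

// Hypothetical helper: fills a heap scratch buffer with 0 .. n-1 and
// returns the sum. The allocation and the release both happen inside a
// function that is itself marked pure.
int sumUpTo(size_t n) pure nothrow @nogc @trusted
{
    if (n == 0)
        return 0;

    auto buf = cast(int*) pureMalloc(n * int.sizeof);
    if (buf is null)
        return 0; // allocation failed; this sketch simply gives up

    scope (exit) pureFree(buf); // release the buffer on every exit path

    int total = 0;
    foreach (i; 0 .. n)
    {
        buf[i] = cast(int) i;
        total += buf[i];
    }
    return total;
}

unittest
{
    assert(sumUpTo(4) == 6); // 0 + 1 + 2 + 3
}
```

From the caller's side the buffer never escapes, so the function behaves like any other pure function of `n`, apart from the allocation-failure path that the thread debates.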
|
@JackStouffer, But I agree on one point; the definition of weakly pure has to be extended to nullary functions that return non-immutable references. |
d9b9724 to
65d9a07
Compare
|
@jpf91 See discussion and explanations here: https://issues.dlang.org/show_bug.cgi?id=15862 |
If you imagine, the pure world-view is that before something is created, it never existed. However, practically, it must already exist (a computer is a finite system, and everything requested by a program exists before it's requested). So we need a place to cheat, but still present an effectively pure view of the world. It means that you cannot see the difference between receiving a piece of data that was once part of a global pool (e.g. visible only by the GC), and one that never existed before. Think of pure-driven memoization. Where is the memoized data stored? Has to be in a global pool, right? But yet, the function that uses it is pure, because the caller cannot tell the difference. In your file example, I would say creating a NEW file that is locked and/or cannot be seen by any others is "effectively" pure (and so will all i/o for it be pure), but opening a file that exists or opening a file that others can open, etc. cannot be pure. |
|
@schveiguy thanks. I think we really need some updated strong/weak pure documentation though, AFAIK the official reference on dlang.org does not even mention weak vs strong pure. |
And it doesn't have to. The implications of weak vs. strong pure are compiler optimization details. Only the rules need to be described. That being said, I think we should describe and define the terms somewhere on the site. |
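Since the terms "weak" and "strong" purity come up repeatedly in this thread without a definition, the following small sketch may help. It reflects the commonly used informal meaning (weakly pure: may mutate data reachable through its parameters; strongly pure: parameters carry no mutable indirections), not official spec wording. The function names are made up for illustration.

```d
// Weakly pure: touches no global state, but may mutate what it is
// handed through a mutable reference.
void bump(ref int x) pure nothrow @nogc @safe
{
    ++x;
}

// Strongly pure: the only parameter is a plain value, so the result
// depends on nothing but the argument. It may still call weakly pure
// helpers on its own local data.
int addTwo(int x) pure nothrow @nogc @safe
{
    int t = x;
    bump(t);
    bump(t);
    return t;
}

unittest
{
    assert(addTwo(1) == 3);
}
```

The point of the split is that `bump` would be useless if it were rejected as impure, while calls to `addTwo` are the ones an optimizer could, in principle, treat as repeatable.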
|
@nordlow would you consider updating the docs please? |
|
I'll give it a try, @andralex |
|
On 11/01/2016 05:54 PM, Jack Stouffer wrote:
Can you elaborate on how you consider calloc doing IO? And calloc can
(Note: In the following I don't consider memory allocations to do IO. Naively: A Being pedantic, that excludes any kind of memory allocation, including So maybe: A Then there would be a paragraph about how the compiler must reject any This is getting rather complicated, for sure. And I probably missed The thing is that GC.malloc has been
I'm afraid |
|
@JackStouffer @aG0aep6G You're both complicating The only two aspects of it that are arguably odd at all are that memory allocation is permitted and that All of the difficulty in marking stuff like C functions as Now, if you start worrying about exactly when optimizations can be made based on |
(Assuming you mean "that the function cannot access [...]".) I'm afraid that's not true when GC.malloc is
Depends on how "state" is defined, I guess. You can make output without accessing global variables, as you don't need them to make a syscall. Maybe that still involves touching global "state", but then we have to define "state" to include IO but not memory allocation. So we have to consider IO at some point, no?
I agree that
Ok, now you're adding an exception to your "very simple" rule, making it not so simple anymore. I'm not opposed to doing it that way. State a simple base rule and then the exceptions to it; fine. But GC.malloc doesn't just allocate memory from the OS, it also has to do book-keeping via mutable globals. In the end, "state" needs to be carefully defined if we want a |
Update according to dlang/druntime#1683 I'm not sure if this is enough or if I should update the general formulation of purity as well. Made an existing statement more clear via a neither-nor formulation.
|
@andralex Added dlang/dlang.org#1510 Do we need to extend the general explanation of purity as well? If so, I'd be happy to receive proposals or just a brief summary of what to include. |
I/O either involves call non- Memory allocation is more of a grey area, but the functions in question are marked as It's even normally simple to figure out whether a C function can be marked as There's no question that understanding what the compiler does with |
|
On 11/02/2016 04:20 AM, Jonathan M Davis wrote:
You can also make output by making the syscall in assembler:
void main()
{
    auto message = "Hello, world!\n";
    auto length = message.length;
    auto pointer = message.ptr;
    asm
    {
        // linux x86_64
        mov RAX, 1; // write
        mov RDI, 1; // stdout
        mov RSI, pointer;
        mov RDX, length;
        syscall;
    }
}
The code doesn't touch any globals. So if that was the only requirement,
You're looking at it from the perspective of a D programmer, but we also If the guarantee/requirement is just that a
Since there is no specification of dmd's additional guarantees, relying |
It's clear from the function signature that optimizing away The spec isn't even vaguely precise enough to write a compiler from without looking at what dmd does, and yes, druntime is tied to the dmd front-end. Long term, we definitely want a spec that's precise enough to write a compiler from without looking at what dmd does, and Andrei was talking at dconf about looking at doing that, but it hasn't happened yet. Realistically, the definition of D is a combination of the online docs, what dmd actually does, and what Walter says. And no, that's not ideal, but that's reality at the moment. Work has been done to improve the online spec, and there are at least two projects working on implementing a D compiler based on the spec, which has led to further improvements of the spec, but until someone with the right skillset actually writes a formal spec that Walter agrees to, we don't really have one. We have online documentation that we call a spec, but it's way too informal to function as one, and it doesn't come even close to having the level of detail that would be needed for a true spec. |
|
On 11/02/2016 09:40 AM, Jonathan M Davis wrote:
In addition to a better spec, we also want to decouple druntime from |
|
@nordlow generally the more and detailed docs the better. In a way the good info in this PR will be wasted if not immortalized in the form of documentation. @nordlow would be awesome if you acted as a curator. @jmdavis is basically right: yes, a |
Do I assume right that you are looking for a way to reduce dead GC and stdlib memory allocations? If so, I agree that we should. However, I'm not sure about using |
|
@ibuclaw If we can do memory allocation in |
|
@jmdavis - I was thinking about it from the optimizer's point of view. But for me it is easier to reason about a GC new operation as being pure. A variable that is unused apart from its initial GC assignment to me stands out as an opportunity to eliminate the call entirely. If memory returned from the GC isn't set or read, it will just be recycled anyway. A typical example I used to see in gdc until fixed was when a function that creates a closure gets inlined, then const-folded away. However, apart from the return result, you're also left with a GC malloc call to initialize the now unused closure pointer. The latter was never discarded because the backend assumed the worst about any side effects such a call may have. |
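To make the kind of dead allocation described above concrete, here is a hedged sketch. The function `compute` is hypothetical; whether any given compiler actually removes the allocation is exactly what the thread is discussing, so the example only shows the shape of the code, not a guaranteed optimization.

```d
// Hypothetical example: the GC allocation below is never read or
// written after creation, so its result is dead. An optimizer that
// treats GC allocation as free of observable side effects could drop it.
int compute() pure nothrow @safe
{
    auto unused = new int[](16); // allocated, then never touched again
    return 42;
}

unittest
{
    assert(compute() == 42);
}
```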
Late to the party, but since |
|
@nemanja-boric-sociomantic See dlang/dlang.org#1528, which requires malloc to be called (no memoization). Then we need to figure what the impact of it setting a global is; I'm unclear on that right now. |
In response to dlang/dmd#6197
Why does the FreeBSD_32 target fail?